The statistics of suicide

Robert D. GIBBONS 



1. Why is suicide so important to study? 

Worldwide, there are about one million suicides
annually. In the United States (USA) approximately
750,000 people died by suicide over the last 25 years.
Suicides outnumber homicides by at least a 3:2 ratio
in the USA. Deaths from suicide exceeded deaths from
AIDS by 200 000 in the past 20 years. During the Vietnam
War, four times as many Americans died by suicide as
were killed in the war. [1] More deaths by suicide
were recorded among American military personnel during
the recent Iraq and Afghanistan wars than were recorded
for military-related casualties. [2] Nonetheless, suicide is
a rare event with an annual rate in the US of 12 per
100 000, making it an extremely difficult phenomenon
to study using conventional approaches. Suicide is the
third leading cause of death in adolescents 10 to 14
years of age in the US and the leading cause of death
in this age group in several other countries including
China, Sweden, Ireland, Australia and New Zealand. [1]

The enormous human cost of suicide in youth
makes research and prevention a national priority.
Over 90% of youth suicides in the USA are associated
with psychiatric illness; [1,3,4] however, only 2% of
youths who died by suicide were on medication at the
time of their death. [5,6] In a study of 49 adolescent
suicides in Utah, 24% had been prescribed
antidepressants but none of them tested positive for
antidepressants at the time of their death. [7] In a
post-mortem study conducted on 66 youth suicides in
New York City, [5] only four had measurable levels of
antidepressants (2 with imipramine and 2 with
fluoxetine).

Suicide is rare in younger children (less than
1/100 000 per year in 5- to 14-year-olds [1]), but it is
more common after mid-adolescence. The annual
rates in the US among 15- to 19-year-olds are 3 per
100 000 for girls and 15 per 100 000 for boys. [8] In
contrast to suicide mortality, suicidal thinking and
suicide attempts are relatively common: every year,
19% of teenagers 15 to 19 years of age in the general
USA population have suicidal ideation and nearly 9%
make a suicide attempt. [9] Suicidal behavior is even
more frequent in youth receiving care for depression;
35 to 50% have made, or will make, a suicide
attempt [10-12] and between 2 and 8% will die by
suicide over the decade following their first
treatment. [10,11,13]

2. Why is suicide difficult to study? 

For many reasons, suicide is one of the more 
difficult adverse events to study. First, suicide is a 
rare event so it is generally not possible to study 
completed suicide in RCTs or even in reasonably large 
pharmacoepidemiologic studies. Consequently, the 
suicidal events that form the basis for prevention 
measures (such as the FDA black-box warnings) 
are usually suicidal thoughts, which are far more 
prevalent than suicide completions or suicide attempts 
(particularly in psychiatric populations) but may be of 
limited value in predicting completed suicide. 
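
To make the rarity argument concrete, the short sketch below computes how many completed suicides one would expect in a single arm of a trial. It is only an illustration: the arm size and follow-up length are hypothetical, and it applies the general-population rate of 12 per 100 000 per year quoted earlier, whereas rates in clinical populations are higher. The zero-event probability assumes a Poisson model.

```python
import math

def expected_events(n_per_arm, annual_rate_per_100k, years):
    """Expected number of events in one arm under a constant annual rate."""
    return n_per_arm * (annual_rate_per_100k / 100_000) * years

def prob_no_events(expected):
    """Poisson probability of observing zero events given the expected count."""
    return math.exp(-expected)

# Hypothetical trial arm: 2000 participants followed for 12 weeks.
arm_size = 2000
follow_up_years = 12 / 52
general_rate = 12.0  # per 100 000 per year (rate quoted in the text)

mu = expected_events(arm_size, general_rate, follow_up_years)
p_zero = prob_no_events(mu)

print(f"expected suicides per arm: {mu:.3f}")
print(f"probability of observing none: {p_zero:.3f}")
```

Even with a rate many times higher than the one assumed here, a trial of this size would still be expected to observe very few completed suicides, which is why trials fall back on ideation and attempts as outcomes.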

Large scale pharmacoepidemiologic studies generally 
focus on suicide attempts or acts of deliberate self-harm, 
though in some cases they also include a small number 
of completed suicides, particularly in countries where 
national death registries are linkable to health services 
utilization data, which is generally not true in the USA. 
These observational studies often suffer from selection 
bias that can result in 'confounding by indication' and 
other problems which limit our ability to draw causal 
inferences. For example, patients with depression
have both an increased risk of suicidal behavior and
an increased likelihood of taking antidepressant
medications; hence the association between taking
antidepressants and suicidal events that is invariably
found in such studies is confounded by the indication
for the use of antidepressants, namely depression.
While antidepressants may increase the risk of suicidal
events, suicidal events definitely increase the
likelihood of antidepressant treatment. As shown by
Simon and colleagues, [14] the greatest risk of suicidal
behavior is in the month prior to treatment initiation.
The same confounding by indication problem exists for
the purported role of anti-smoking medications and
anti-epileptic medications in suicidal behavior: patients
with psychiatric illness have elevated rates of smoking,
so they are more likely to use anti-smoking medications,
and several anti-epileptic medications are often
prescribed as adjunctive treatment of bipolar disorder.

Selection effects are generally eliminated in RCTs, 
but RCTs are not without their own set of problems 
which limit inferences. As discussed above, given their 
limited size, RCTs are generally only able to examine 
suicidal ideation, which may tell us little about suicide 
risk. Traditionally, RCTs have not been designed to 
examine suicide risk; they are usually focused on 
retrospective spontaneous reports of suicidal thoughts 
and behaviors of study participants. Such data are 
subject to ascertainment bias, [15] in which the method of
eliciting the suicidal information can result in apparent 
differences in the rates of these events between 
treated subjects and untreated controls. For example, 
patients randomized to active medication will have 
more side-effects in general than patients randomized 
to placebo; this will result in greater contact with study 
staff and more opportunity to report suicidal thoughts 
and behavior. Similarly, suicide attempts in which the 
individual ingests the study medication will result 
in increased likelihood of detection among actively 
treated subjects because overdose of active medication 
(e.g., an antidepressant) will have a greater likelihood 
of emergency room contact than overdose of an inert 
placebo. 

3. What do we know about suicide and antidepressants? 

One of the greatest recent controversies in the safety 
of pharmaceuticals is the question of whether certain 
classes of medications (e.g., antidepressants) increase 
the risk of suicidal thoughts, behavior, and completion. In 
2004, the US Food and Drug Administration (FDA) placed 
a black-box warning on all antidepressants because of 
concern that such medications increased risk of suicidal 
thoughts and behavior in children and, in 2006, extended 
the warning to young adults. These warnings are not 
limited to antidepressants, but have also been placed 
on anti-epileptics, smoking cessation drugs (varenicline), 
acne medications such as isotretinoin, beta blockers, 
reserpine and drugs for weight loss. [16] This topic was
discussed in a recent paper in the Shanghai Archives of
Psychiatry. [17] A recent review by Gibbons and Mann [18]
provides a detailed summary of the recent research
about the relationship between medication use and
suicide.

Questions regarding a possible relationship between
antidepressants and suicide emerged in 1990 with the
publication of a series of case reports in which the then
newly introduced selective serotonin reuptake inhibitors
(SSRIs) were associated with the apparent emergence
of suicidal thoughts and behavior. [19] These early
observations led to US FDA hearings in 1991 that did
not find evidence of an increased risk of suicidal acts
associated with antidepressants. These early case studies
set the stage for the development of new approaches
to the analysis of pharmacovigilance data in general
and with respect to suicide in particular. Attention to
the potential relationship between antidepressants
and suicide led to a US black-box warning for children
under 18 years of age in October 2004. The evidence
supporting the warning was a meta-analysis conducted
by the FDA, [20] which combined spontaneous reports
of suicidal thoughts and behaviors from 25
placebo-controlled pediatric RCTs of newer antidepressant
medications. The conclusion was that higher rates of
self-reported suicidal ideation and behavior occurred
in children treated with antidepressants than in those
receiving placebo (OR=1.78; 95% CI=1.14, 2.77). The
FDA also presented results of an analysis of prospective
data (based on a suicidal ideation or behavior
rating-scale item), which showed no effect of antidepressant
use on the emergence or worsening of suicidal thoughts
and behaviors (OR=0.92; 95% CI=0.76, 1.11). The difference
between prospective clinician ratings and spontaneous
patient reports of suicidal ideation and behavior has
never been adequately explained; it may be due to
ascertainment bias between active treatment and
placebo groups.
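
The contrast between the two pediatric analyses can be put on a common scale by converting each reported odds ratio and confidence interval into a z statistic. The sketch below does this under the assumption that the reported intervals are Wald intervals, symmetric on the log odds ratio scale; the FDA's exact interval method may differ, so the results are approximate. The function name `z_from_ci` is illustrative.

```python
import math

def z_from_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover log(OR), its standard error and z from a reported 95% interval.

    Assumes a Wald interval: log(OR) +/- z_crit * SE.
    """
    log_or = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    return log_or, se, log_or / se

# Values reported in the text for the pediatric analyses.
spontaneous = z_from_ci(1.78, 1.14, 2.77)   # spontaneous adverse event reports
prospective = z_from_ci(0.92, 0.76, 1.11)   # prospective rating-scale item

for label, (log_or, se, z) in [("spontaneous", spontaneous),
                               ("prospective", prospective)]:
    print(f"{label}: log OR = {log_or:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

Under this approximation the spontaneous-report estimate exceeds the conventional 1.96 threshold while the prospective estimate does not, which matches the intervals quoted above (the first excludes 1, the second includes it).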

In January 2006, the FDA conducted a second
meta-analysis [21] of 372 RCTs of newer antidepressants in
adults with a pooled sample of approximately 100 000
individuals. The analysis was based solely on spontaneous
adverse event reports from these RCTs; no data on
prospective clinician ratings were provided in the studies.
While the overall analysis revealed no evidence of an
association, stratification by age revealed that for the
primary endpoint of suicidal ideation or behavior, 18- to
24-year-olds taking antidepressants had an increased
risk compared to those taking placebo that approached
statistical significance (OR=1.62; 95% CI=0.97, 2.71). However,
adults aged 25 to 64 years had a significantly decreased
risk (OR=0.79; 95% CI=0.64, 0.98), and geriatric patients had a
markedly and significantly decreased risk (OR=0.37; 95% CI=0.18,
0.76). On the basis of these results, the FDA extended the
black-box warning to include 18- to 24-year-olds.

Since the FDA warnings, several studies have
raised serious questions regarding the results of
the FDA analyses. Bridge and colleagues [22] analyzed
an expanded set (27 studies) of pediatric RCTs of
antidepressant treatment and suicidality; they found
that the association between antidepressant treatment
and suicidality was much weaker than reported in
the FDA's original findings. Gibbons and colleagues [23]
studied a cohort of 226 866 veterans with a new episode
of major depressive disorder and found a significantly lower
rate of suicide attempt in those treated with monotherapy
SSRIs compared with those treated without antidepressant
medication (123/100 000 for SSRIs versus 335/100 000
for no antidepressant; OR=0.37; p<0.0001). Moreover,
among veterans treated with monotherapy SSRIs the rate
of suicide attempts after treatment (123/100 000) was
significantly lower than the rate before treatment
(221/100 000; relative risk 0.56; p<0.0001). Analyses stratified
by age did not confirm the FDA's findings of increased
suicidality for 18- to 24-year-olds. The veterans
data have also been re-analyzed using person-time
logistic regression. [24] This analysis found a significant
decrease in suicide attempt rate during monotherapy
SSRI treatment (hazard ratio [HR], 0.17; 95% CI=0.10, 0.28;
p=0.0001). The suicide attempt rate decreased with
time from the index episode, and the hazard rate was
much lower for patients treated with monotherapy
SSRIs (versus non-pharmacological treatments) during
the first few months following treatment initiation; by
9 months following the index episode, however, the
treatment groups had become indistinguishable.

Ecological studies conducted following the FDA's
black-box warning revealed that there may have been
unintended consequences of the warning. Several
authors [25-28] have now shown that antidepressant
prescription rates dropped precipitously following
the warning. Both Gibbons and colleagues [26] and
the US Centers for Disease Control and Prevention [29]
documented a 14% increase in child and adolescent
suicide rates following the decrease in antidepressant
prescriptions. Libby and colleagues [30,31] found a
44% reduction in the diagnosis of new cases of child
depression among general practitioners following the
black-box warning and a 37% reduction in the diagnosis
of new cases among young adults.

Recently, Gibbons and colleagues [32,33] synthesized
all the longitudinal data from 40 drug-company-sponsored
RCTs and one large National Institute of Mental
Health RCT, all placebo-controlled, of fluoxetine for youth,
adults and the elderly, and of venlafaxine in adults. Both
drugs were shown to be efficacious in all age cohorts,
although the maximum benefit was observed for
children and only marginal benefit was observed for the
elderly following six weeks of treatment. With respect
to suicidal thoughts and behavior, significant benefits of
antidepressant treatment were observed in adults and
the elderly, and these benefits were mediated by larger
decreases in depressive severity observed in treated
patients relative to placebo controls. In children,
despite statistically and clinically significant benefits in
terms of depression observed with active treatment,
no significant difference between treated and control
patients was observed in the rates of suicidal ideation
and behavior. These results indicate that suicidal
thoughts and behavior are driven by depression in
adults but this does not appear to be the case for
children. This finding is consistent with a recent finding
by Kessler and colleagues, [34] who found that over 80%
of suicidal adolescents received some form of mental
health treatment, but the treatment failed to prevent
suicidal behavior.



4. Are there more effective methods for measuring suicide risk?

As noted, the use of spontaneously reported 
retrospective accounts of suicidal thoughts and behavior 
even in the context of RCTs can lead to invalid statistical 
inferences. Previously, prospective measurements 
of suicidality were usually based on ratings of a single 
symptom item that has response categories ranging 
from suicidal thoughts to planning to behavior. Recently,
the US FDA [16] has endorsed use of the Columbia-Suicide
Severity Rating Scale (C-SSRS) [35] for routine prospective
assessment of suicidal risk in RCTs involving any central
nervous system related drug. The C-SSRS provides direct
classification of suicidal events into 11 categories: 5
concern suicidal ideation (ranging from passive
thoughts to active ideation including method, intent
and planning), 5 concern suicidal behaviors (ranging from
preparatory actions to completed suicide), and 1 covers
self-injurious behavior with no suicidal intent. The advantage
of the C-SSRS is that it standardizes what we mean by 
suicidal events and eliminates the ascertainment bias 
that can be produced by spontaneous reports when 
comparing patients receiving an active treatment versus 
a pharmacologically inactive control. This is an important 
advance for RCTs in which suicide is an adverse event 
of concern, and it will be of considerable interest to 
examine the association between antidepressant 
treatment and suicidal events in youth as more data 
using the C-SSRS become available. 

Identification of individuals with significant suicidal
ideation or those who have already made a serious
attempt may be too late for the purpose of prevention. [1]
In adults and the elderly, we know that depressive
severity is an important mediator of suicidal thoughts
and behaviors; the ability to measure depression and
screen for suicidal risk more widely and less invasively
is therefore still sorely needed. This is particularly
true in high-risk populations such as veterans of military
actions who in the US are at greater risk of death by
suicide than death from a battle-related injury. Recently
Gibbons and colleagues [36] developed a computerized
adaptive test of depressive severity (the CAT-Depression
Inventory, CAT-DI) that can be self-administered in two
minutes and requires an average of 12 items per subject,
yet maintains a correlation of 0.95 with the total item
bank score based on almost 400 items. Using a simple
empirically derived threshold, the test has a sensitivity
of 0.92 and a specificity of 0.88 for identifying a major
depressive disorder (using the diagnosis derived by a
clinician using the Structured Clinical Interview for
DSM-IV as the gold standard). The test is based on
multidimensional item response theory (MIRT) [37,38]
and one of the subdomains includes 14 suicide items.
In the event that a suicide item is not administered as
a part of the adaptive test, 1 to 4 additional suicide
screening items are administered and if any item is
endorsed at a moderate level or above, a suicide alert is
sent to the treating clinician or managed care provider.
The advantage of an adaptive self-report measure of 
depressive severity and suicidal risk is that it can be 
administered to large populations via the internet from 
a cloud computing environment. Furthermore, unlike 
traditional brief, fixed-length instruments such as the 
PHQ-9 (Patient Health Questionnaire), which involve 
repeatedly administering the same set of items (which 
can result in response set bias), the CAT-DI adapts to 
changes in depressive severity within individuals and 
asks different questions depending on the current level 
of impairment. Reduction in respondent burden is 
achieved by initiating the next CAT testing session based 
on the estimated depressive severity from the previous 
session and, thus, reducing the number of items that 
need to be administered. Another advantage of CAT
is that the termination criterion (which sets the
required level of precision of the estimate and, with
it, the number of items required: the smaller the
permitted standard error, the more items are needed)
can be different for different applications. For example,
in an RCT we may want extremely precise estimates
that will enable us to obtain the most accurate estimate
of a treatment effect of interest and will, thus, require
a larger number of items (e.g., 20-30). In primary care,
we may require a somewhat less precise estimate which
is sufficient to detect depression when present and
monitor the effectiveness of treatment so it will require
an intermediate number of items (e.g., 10-12). In
psychiatric epidemiology, we may require a less precise
estimate based on fewer items (e.g., 5 or 6) that is
sufficient for determining the prevalence of depression
within a specified population. All that is required is
to change the termination criterion (i.e., the required
standard error of the severity level estimate) depending
on the requirements of the specific application. The
paradigm shift is from a traditional fixed-length test
that has a small number of items and may result in
variable measurement precision, to a variable-length
test that uses a small but optimally selected set of
items for the specific respondent and leads to constant
measurement precision across individuals. Additional
CATs for anxiety, hypomania/mania spectrum and a 
brief depression diagnostic screening test have also 
been developed using this methodology. 
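
The role of the termination criterion can be illustrated with a deliberately simplified adaptive test. The sketch below is not the CAT-DI algorithm: it assumes a unidimensional two-parameter logistic model rather than the multidimensional model described above, scores by posterior mean on a grid with a standard normal prior, and uses a hypothetical item bank and a hypothetical deterministic respondent. It shows only how tightening the required standard error changes the number of items administered.

```python
import math

GRID = [g / 10 for g in range(-40, 41)]  # severity grid from -4 to 4

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item (discrimination a, location b)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def estimate(responses):
    """Posterior mean and SD of severity under a standard normal prior."""
    post = []
    for theta in GRID:
        dens = math.exp(-0.5 * theta * theta)
        for (a, b), y in responses:
            p = p_endorse(theta, a, b)
            dens *= p if y else 1.0 - p
        post.append(dens)
    total = sum(post)
    post = [d / total for d in post]
    mean = sum(t * p for t, p in zip(GRID, post))
    var = sum((t - mean) ** 2 * p for t, p in zip(GRID, post))
    return mean, math.sqrt(var)

def run_cat(bank, answer, se_target, max_items=30):
    """Administer items until the posterior SD reaches se_target."""
    remaining, responses = list(bank), []
    theta, se = estimate(responses)
    while remaining and len(responses) < max_items and se > se_target:
        def info(item):
            # Fisher information of a 2PL item at the current estimate
            a, b = item
            p = p_endorse(theta, a, b)
            return a * a * p * (1.0 - p)
        item = max(remaining, key=info)
        remaining.remove(item)
        responses.append((item, answer(item)))
        theta, se = estimate(responses)
    return theta, se, len(responses)

# Hypothetical bank of 60 items and a hypothetical respondent.
bank = [(1.0 + 0.5 * (i % 3), -3.0 + 0.1 * i) for i in range(60)]
true_severity = 1.0

def answer(item):
    """Deterministic respondent: endorses items located below true severity."""
    return item[1] < true_severity

results = {target: run_cat(bank, answer, target) for target in (0.5, 0.4, 0.3)}
for target, (theta, se, n_items) in results.items():
    print(f"SE target {target}: {n_items} items, estimate {theta:.2f}, SE {se:.2f}")
```

Because the item sequence is identical across runs, a stricter target can only lengthen the test, which is the trade-off between precision and respondent burden described in the text.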

5. What improvements in statistical methodologies are possible for the study of suicide?

From a statistical perspective, the analysis of suicide
and related events is among the most challenging and
interesting drug safety problems. There is no other
area where the indication for treatment is so strongly 
confounded with the adverse event of interest. Even 
in well-controlled observational studies, selection 
effects can lead to severely biased results. Since suicide 
events are rare, RCTs in and of themselves generally 
have sample sizes that are too small to draw valid 
inferences. Furthermore, patients enrolled in RCTs may 
have little resemblance to those patients who are the 
ultimate consumers of the medications of interest. In 
the following, I provide a brief overview of several areas 
of promising statistical research. 

5.1 Meta-analysis 

Most meta-analyses of rare binary events in medical
research (including suicidal events) are based on either
the fixed-effect model (the 'Mantel-Haenszel method') or
the random-effect model of DerSimonian and Laird. [39] The
fixed-effect model assumes that the treatment effect
is constant over studies and the random-effect model
allows the treatment effect to vary from study to study.
Recently, Bhaumik and colleagues [40] studied these
estimators and found that the treatment effect can be
grossly over-estimated when there is
significant variability in the treatment effect across
studies. The bias is smaller for the random-effect model
than for the fixed-effect model, but still appreciable.
These estimators also require a continuity correction for
zero cells in a given trial, and if the number of events
in both arms is zero, the study must be removed
from the computation. Alternative methods based on
non-linear mixed-effects regression models [41] do not
require continuity corrections or removal of zero-event
studies and do not suffer from bias due to treatment
effect heterogeneity across studies. The disadvantage
of these more advanced meta-analysis procedures is
that the results are dependent on the particular model
specification (random background event rate, random
treatment effect, or both random effects and their
correlation). While the correct model specification is
an empirical question, a model with two correlated
random effects (random background incidence and
random treatment effect) generally works well. [42]
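
For readers who want to see the two conventional estimators side by side, the following sketch implements one common form of each on made-up data: a Mantel-Haenszel pooled odds ratio and a DerSimonian-Laird random-effect estimate on the log odds ratio scale, with a 0.5 continuity correction applied to studies that have a zero cell. The trial counts are hypothetical, and conventions differ between software packages (for example, in when the correction is applied), so this should be read as an illustration rather than a reference implementation.

```python
import math

# Hypothetical trials: (events_treated, n_treated, events_control, n_control)
trials = [
    (3, 200, 1, 200),
    (0, 150, 2, 150),   # single-zero study: needs a continuity correction
    (0, 100, 0, 100),   # double-zero study: carries no odds-ratio information
    (5, 400, 4, 400),
]

def cells(trial, correction=0.0):
    """Return the 2x2 cells (a, b, c, d), corrected if any cell is zero."""
    a, n1, c, n0 = trial
    b, d = n1 - a, n0 - c
    if correction and 0 in (a, b, c, d):
        a, b, c, d = (x + correction for x in (a, b, c, d))
    return a, b, c, d

def mantel_haenszel_or(trials):
    """Fixed-effect pooled odds ratio; double-zero studies contribute nothing."""
    num = den = 0.0
    for t in trials:
        a, b, c, d = cells(t)
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return num / den

def dersimonian_laird(trials, correction=0.5):
    """Random-effect pooled log odds ratio with moment estimate of tau^2."""
    used = [t for t in trials if t[0] + t[2] > 0]  # drop double-zero studies
    y, v = [], []
    for t in used:
        a, b, c, d = cells(t, correction)
        y.append(math.log(a * d / (b * c)))
        v.append(1 / a + 1 / b + 1 / c + 1 / d)
    w = [1 / vi for vi in v]
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    denom = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(y) - 1)) / denom)
    w_star = [1 / (vi + tau2) for vi in v]
    pooled = sum(wi * yi for wi, yi in zip(w_star, y)) / sum(w_star)
    return {"k": len(used), "log_or": pooled,
            "se": math.sqrt(1 / sum(w_star)), "tau2": tau2}

mh = mantel_haenszel_or(trials)
dl = dersimonian_laird(trials)
print(f"Mantel-Haenszel OR: {mh:.3f}")
print(f"DerSimonian-Laird OR: {math.exp(dl['log_or']):.3f} "
      f"(k={dl['k']}, tau^2={dl['tau2']:.3f})")
```

Note how the double-zero trial drops out of both estimators and the single-zero trial enters the random-effect estimate only through an arbitrary correction; these are the two features that the mixed-effects regression approach avoids.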

While meta-analysis combines effect sizes such as
standardized mean differences or odds ratios, 'research
synthesis' provides a re-analysis of the complete set of
person-level longitudinal data from each study. As an
example, the previously discussed papers by Gibbons and
colleagues [32,33] performed 3-level linear (efficacy) and
non-linear (safety) mixed-effects regression analyses [37]
of the data from a series of 41 RCTs on the efficacy and
safety of antidepressants. In these analyses, the
intercept and slope of the temporal trends in efficacy and
safety measures are allowed to vary from individual to
individual and the study means of these same effects are
allowed to vary from study to study. With proper
specification of the variance component structure, the overall
pooled estimate of the treatment by time interaction
tests the overall efficacy (or safety) of the medication of
interest.

5.2 Person-time models 

Person-time regression or discrete-time survival
analysis [43] is an ingenious approach to fitting a
time-to-event or survival analytic model in a parametric way
using standard logistic regression software. The basic
idea is to discretize time into a set of smaller intervals
and to then record the number of subjects at risk in
each interval, the number experiencing the event (e.g.,
suicide attempt), and the number censored. A similar
approach can be taken using unstructured data in
which each subject i contributes n_i records, one per
interval, up to either the point in time at which the
event was first experienced or the end of the follow-up
period. Advantages of the approach are that: (a)
time-varying predictors are easily accommodated, (b)
random effects such as the nesting of patients within
hospitals, clinics, or counties can be easily included,
(c) competing risks such as death by suicide or other
causes of mortality can be examined, and (d)
non-proportional hazard models can be estimated. [41,44]
The net result is that
we can relax the assumption that once a subject is
exposed they are always exposed and replace it with
any exposure pattern (e.g., monthly) and produce a
within-subject estimate of the effect of the exposure
on the probability of experiencing the adverse event.
Note that unlike a traditional mixed-effects logistic
regression for a repeated binary event, these models
are restricted to a single event per person and as
such, the repeated observations within individuals are
conditionally independent. [43]
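
The data layout that person-time regression relies on can be sketched in a few lines. The example below expands hypothetical subject-level records into one record per subject per month, with a time-varying exposure indicator and an event indicator that is 1 only in the month of the first event. The subjects, field names and monthly interval are all assumptions made for illustration; fitting the logistic regression itself is left to whatever statistical software is at hand.

```python
from collections import defaultdict

# Hypothetical subjects: months of follow-up, whether follow-up ended with a
# first suicide attempt, and the months during which the subject was exposed.
subjects = [
    {"id": 1, "months": 3, "event": True,  "exposed_months": {2, 3}},
    {"id": 2, "months": 4, "event": False, "exposed_months": {1, 2, 3, 4}},
    {"id": 3, "months": 2, "event": True,  "exposed_months": set()},
    {"id": 4, "months": 4, "event": False, "exposed_months": {3, 4}},
]

def to_person_period(subjects):
    """Expand each subject into one record per month at risk."""
    rows = []
    for s in subjects:
        for month in range(1, s["months"] + 1):
            rows.append({
                "id": s["id"],
                "month": month,
                "exposed": int(month in s["exposed_months"]),
                # the event can only occur in the subject's final record
                "event": int(s["event"] and month == s["months"]),
            })
    return rows

def discrete_hazard(rows):
    """Proportion of at-risk subjects experiencing the event in each month."""
    at_risk = defaultdict(int)
    events = defaultdict(int)
    for r in rows:
        at_risk[r["month"]] += 1
        events[r["month"]] += r["event"]
    return {m: events[m] / at_risk[m] for m in sorted(at_risk)}

rows = to_person_period(subjects)
hazard = discrete_hazard(rows)

print(f"{len(rows)} person-month records")
for month, h in hazard.items():
    print(f"month {month}: hazard {h:.3f}")
```

Regressing the `event` column on month indicators and `exposed` with ordinary logistic regression software then gives the discrete-time survival model; because exposure is recorded per month, a subject can move in and out of treatment over follow-up.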

5.3 Causal inference

Since larger sample sizes are required to study
events such as suicide attempts, this generally leads to
large-scale pharmacoepidemiologic studies of medical
claims data, which suffer from the usual problems
associated with the analysis of observational data. To
insulate inferences from bias produced by the selection of
patients to treatments (either self-selected or selected
by their treating physician based on observable
characteristics such as severity of illness) we turn to methods
designed to draw causal inferences from observational
studies. The now classic approach is based on
propensity score matching [45,46] in which patients who do
or do not receive a particular treatment of interest are
matched on a large number of potential confounders
(e.g., age, sex, concomitant treatments, comorbid
diagnoses, prior attempts) and the likelihood of receiving
treatment (e.g., an antidepressant). The fundamental
idea is to carve an RCT out of an observational study,
without eliminating so much of the data that the 'RCT'
is no longer generalizable.
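
The matching step itself can be sketched as follows. The example assumes that a propensity score (the estimated probability of receiving treatment given the measured confounders, for instance from a logistic regression) has already been computed for every patient, and then performs greedy 1:1 nearest-neighbour matching without replacement. The patient identifiers, scores and caliper width are hypothetical, and greedy matching is only one of several matching algorithms in use.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated, controls: dicts mapping patient id -> estimated propensity score.
    A treated patient stays unmatched if no remaining control lies within
    the caliper.
    """
    available = dict(controls)
    pairs = []
    # Start with the treated patients who are hardest to match (highest score).
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs

# Hypothetical propensity scores.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
controls = {"c1": 0.78, "c2": 0.52, "c3": 0.10, "c4": 0.57}

pairs = greedy_match(treated, controls)
print(pairs)
```

The unmatched treated patient illustrates the trade-off mentioned above: a narrower caliper gives closer matches but discards more of the data, so the matched sample becomes less representative of the treated population.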

While propensity score matching is useful
conceptually, drug exposures are typically dynamic
and the exposure status takes on different values
over time. Traditional propensity score matching
assumes that treatment status does not change over
time. While some work has been done in the area of
dynamic propensity score matching, [47] an equally if
not more promising approach for dynamic treatment
exposures is based on the idea of marginal structural
models (MSM). [48] The basic idea of MSM is that we
compute the probability of treatment at each of T
time-points and then combine these probabilities to
compute the likelihood of treatment up to a particular
point in time. These probabilities are then standardized
and used as weights in a second stage regression
that models the dynamic effects of treatment on
the adverse event of interest (e.g., suicide attempt)
weighted by the likelihood of receiving treatment at any
particular point in time. While the traditional approach
described by Robins and co-workers rests strongly on
the assumption that all of the important confounders
have been measured and are available for the analysis,
the analysis may be further expanded to include the
effects of unmeasured confounders by adding one or
more random effects to the treatment selection model
as described by Leon and Hedeker [49] in the context of
computing dynamic propensity score adjustments.
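
The weighting step can be sketched for a single subject. The example below uses one common formulation, stabilized inverse-probability-of-treatment weights, in which the probability of the treatment actually received at each time-point given treatment history alone is divided by the same probability given treatment history and time-varying confounders, and the ratios are multiplied over time. The per-time-point probabilities are assumed to come from previously fitted treatment models; the numbers and the function name are hypothetical.

```python
def stabilized_weights(treatment, p_numerator, p_denominator):
    """Cumulative stabilized weights for one subject.

    treatment[t]     : 1 if treated at time t, else 0
    p_numerator[t]   : P(treated at t | treatment history), assumed pre-fitted
    p_denominator[t] : P(treated at t | treatment history and time-varying
                       confounders), assumed pre-fitted
    """
    weights, w = [], 1.0
    for a, p_num, p_den in zip(treatment, p_numerator, p_denominator):
        # probability of the treatment status that was actually observed
        num = p_num if a == 1 else 1.0 - p_num
        den = p_den if a == 1 else 1.0 - p_den
        w *= num / den
        weights.append(w)
    return weights

# Hypothetical subject: untreated in month 1, treated in months 2 and 3.
w = stabilized_weights(
    treatment=[0, 1, 1],
    p_numerator=[0.5, 0.5, 0.8],
    p_denominator=[0.25, 0.4, 0.8],
)
print([round(x, 3) for x in w])
```

The resulting weights would then enter a weighted second-stage model of the outcome, for example the person-time logistic regression of section 5.2. Months in which the confounders carry no extra information about treatment (numerator equal to denominator) leave the weight unchanged.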

6. Where do we go from here? 

The area of drug safety in general and suicide in 
particular is an enormously important problem that 
has traditionally been investigated using quite simple 
approaches which often yield questionable results. 
Improving the quality of analytic work in this important 
area should be a major goal of future applications. 



